Abstract: We carry out a comprehensive study of the resource 
cost of averaging consensus in wireless networks. Most previous 
approaches, such as gossip algorithms, suppose a graphical 
network, which abstracts away crucial features of the wireless 
medium, and measure resource consumption only in terms of 
the total number of transmissions required to achieve consensus. 
Under a path-loss dominated model, we study the resource re- 
quirements of consensus with respect to three metrics appropriate 
to the wireless medium: total transmit energy, elapsed time, and 
time-bandwidth product. First we characterize the performance 
of several popular gossip algorithms, showing that they may be 
order-optimal with respect to transmit energy but are strictly 
suboptimal with respect to elapsed time and time-bandwidth 
product. Further, we propose a new consensus scheme, termed 
hierarchical averaging, and show that it is nearly order-optimal 
with respect to all three metrics. Finally, we examine the effects 
of quantization, showing that hierarchical averaging provides a 
nearly order-optimal tradeoff between resource consumption and 
quantization error. 



I. Introduction 

Consider a network of N nodes, each of which possesses
a scalar measurement $z_n(0) \in \mathbb{R}$. In averaging consensus,
each node wishes to compute the average of these measurements:
$z_{ave} = \frac{1}{N} \sum_{n=1}^{N} z_n(0)$. Averaging consensus often
is described as a simple, canonical example of distributed 
signal processing over sensor networks; a common narrative 
is that each node measures the local temperature and wants 
to compute the average temperature over the sensor field. 
Such simplicity, however, is deceptive, as consensus lies at 
the heart of an array of sophisticated problems. It has seen 
use in load-balancing [1], in distributed optimization [2]-[4],
and in distributed estimation and filtering [5], [6]. Algorithmic
advances in consensus therefore represent advances throughout 
distributed signal processing. 

Due to their simplicity, flexibility, and robustness, gossip 
algorithms have emerged as a popular approach to consensus. 
In gossip, the network is modeled by a graph. Nodes iteratively 
pair with neighbors, exchange estimates, and average those 
estimates together, eventually converging on the true average. 
A large body of excellent work on gossip has been developed, 
from the early randomized gossip of [7] to faster schemes such
as path averaging [8] and multi-scale gossip [9]. Gossip is
simple, requiring minimal processing and network knowledge,
and it is robust, retaining performance even with failing links
and changing topology.

Matthew Nokleby and Behnaam Aazhang are with Rice University,
Houston, TX (email: {nokleby, aaz}@rice.edu). Waheed U. Bajwa is with
Rutgers University, Piscataway, NJ (email: waheed.bajwa@rutgers.edu).
Robert Calderbank is with Duke University, Durham, NC (email:
robert.calderbank@duke.edu). Portions of this work were presented at The
37th International Conference on Acoustics, Speech, and Signal Processing,
Mar. 2012 and will be presented at the Asilomar Conference on Signals,
Systems, and Computers in Nov. 2012.

However, the purpose of consensus strategies is typically 
to facilitate processing over wireless networks, and wireless 
affords possibilities that existing strategies do not fully exploit. 
For example, in random geographic graphs, nodes are taken 
to be neighbors if they lie within a radius, usually chosen 
to be $\Theta(\sqrt{\log N / N})$ following [10]. Sophisticated consensus
algorithms are then constructed in order to minimize the 
number of transmissions needed to achieve consensus. In 
wireless, however, transmit radius is adjustable, and given 
sufficient transmit power the network is fully connected. Since 
wireless transmissions are broadcast in nature, we therefore 
can trivially achieve consensus by having each node transmit 
once. This suggests both that wireless permits flexibility 
that may improve performance, and that we must consider 
additional performance metrics — such as transmit power — that 
encompass more than just the number of transmissions. 

Based on the preceding, we make several observations 
which motivate this work: 

• Consensus algorithms are designed over graphical net- 
works, which abstract away the broadcast and super- 
position nature of the wireless medium. In wireless, a 
single transmission arrives at multiple destinations, and 
multiple transmissions interfere at a single destination. 
By contrast, transmissions over graphical networks are 
usually assumed to be unilateral. 

• Graphical networks presuppose a fixed topology, whereas 
in wireless networks connectivity can be adjusted dynamically
via the transmit power. 

• The performance of consensus algorithms typically is 
measured with respect to the total number of trans- 
missions. In wireless, however, multiple resources are 
expended — namely time, bandwidth, and energy — which 
must be taken into consideration. 

While several of these issues have been taken up individ- 
ually, in this work we consider them jointly to answer the 
following question: What are the resource demands of con- 
sensus over wireless networks? Our objective is to expand the 
framework in which consensus is studied in order to account 
for and exploit features of the wireless medium that previously 
have been overlooked. Given the ability to broadcast and to 
adjust connectivity dynamically, we seek fundamental limits 
on the wireless resources required to achieve consensus, as 
well as practical consensus strategies that attain those limits. 

A. Contributions

First, in Section II we define a realistic but tractable framework
in which to study the resource demands of consensus.
We assume a path-loss dominated propagation model in which 
connectivity is determined by a signal-to-noise ratio (SNR) 
threshold. At first we suppose connected links are perfect and 
have infinite capacity. We define three resource metrics: the 
total energy expended in order to achieve consensus, the total 
time elapsed, and the time-bandwidth product consumed. In 
Section III we derive lower bounds on the required resources,
and in Section IV we characterize several existing consensus
strategies within our framework. We show that while path 
averaging is nearly order optimal with respect to energy 
expenditure, it remains suboptimal with respect to elapsed time 
and consumed time-bandwidth product. 

Next, in Section V we propose a new¹ consensus algorithm,
termed hierarchical averaging, designed specifically for wireless
networks. Instead of communicating with neighbors over
a graph, nodes broadcast estimates to geographically defined
clusters. These clusters expand as consensus proceeds, which
is enabled by adjusting nodes' transmit power. Much like
the hierarchical cooperation of [12] and the multiscale gossip
of [9], small clusters cooperatively broadcast information to
larger clusters, continuing until consensus is achieved. Depending
on the particulars of the channel model, hierarchical
averaging is nearly order optimal. When channel phases are
fixed and identical, hierarchical averaging is order optimal
with respect to all three metrics simultaneously, up to an
arbitrarily small gap in the exponent, for path-loss exponents
$2 \le \alpha < 4$. In the more realistic case in which phases are
random and independent, however, hierarchical averaging is no
longer order optimal in transmit energy when $\alpha > 2$, although
it remains order optimal with respect to the other two metrics.

Finally, in Section VI we incorporate quantization into
our model. Since practical wireless links suffer from noise, 
achievable rates are finite and estimates must be quantized 
prior to transmission. This introduces a tradeoff: expending 
more energy increases the rate of the links, thereby reducing 
the quantization error inherent to each transmission and there- 
fore the estimation error accrued during consensus. Therefore, 
in addition to the resource metrics of energy, time, and 
bandwidth, we introduce a fourth performance metric: mean- 
square error of the consensus estimates. Again we characterize 
existing consensus techniques. We also apply quantization to 
hierarchical averaging, showing that it permits an efficient 
tradeoff between energy and estimation error. 

B. Prior Work 

We detail a few works that are relevant to the present study. 
For a comprehensive examination of consensus and gossip, see 
the excellent survey [13] of Dimakis et al.

Consensus has been studied under various guises, including
the early work of Tsitsiklis [2], who examined averaging in the
context of distributed estimation. Recent interest in consensus
was sparked by the introduction of randomized gossip [7],
which defined the framework and developed the theoretical
machinery in which most subsequent works have operated.
Randomized gossip, however, has relatively slow convergence
on random graphs, requiring roughly $\Theta(N^2)$ transmissions.²
Since then, researchers have searched for faster consensus
algorithms. In geographic gossip [14], nodes pair up with
geographically distant nodes, exchanging estimates via multi-hop
routing. The extra complexity garners faster convergence;
geographic gossip requires roughly $\Theta(N^{3/2})$ transmissions.
Geographic gossip was further refined by the introduction of
path averaging [8], in which routing nodes contribute their
own estimates "along the way." Path averaging closes the gap
to order optimality, requiring roughly $\Theta(N)$ transmissions,
which is the minimum of any consensus algorithm.

1 An early version of hierarchical averaging was presented in [11].

A few works have addressed individually the wireless
aspects we consider in this work. The broadcast nature of
wireless is considered in [15], [16]; however, in these works
broadcast does not significantly improve performance over
randomized gossip. Multi-access interference is addressed,
and in fact exploited, in [17], where lattice codes are used
to compute sums of estimates "over the air." The notion that
network connectivity can be adjusted via power allocation is
explored in [18], and in [19] the optimum graphical structure
for consensus is derived.

Finally, many authors have studied the impact of noisy
links on consensus. In [20], continuous-valued estimates are
corrupted by zero-mean additive noise, and optimal linear
consensus strategies are derived. For a similar model, the bias-variance
dilemma is identified: running consensus for longer
reduces the bias of the resulting estimates, but it increases the
variance. Algorithms that resolve the dilemma are presented,
but they suffer from slow convergence. In [21], [22], quantized
consensus algorithms are presented that achieve consensus
while passing finite-alphabet estimates. In [23], traditional
gossip algorithms are augmented with dithered quantization
and are shown to achieve consensus on the true average in
expectation. In [24], the increasing correlation among estimates
is exploited to construct a consensus algorithm employing
Wyner-Ziv style coding with side information.

II. Preliminaries 

A. System Model 

In defining the wireless model, we aim for a balance 
between tractability and practicality. To this end we make 
four critical assumptions, which we contend capture the salient 
features of wireless while maintaining simplicity: synchronous 
transmission, path-loss propagation, "protocol"-model connec- 
tivity, and orthogonalized interference management. In this 
subsection we detail and justify our assumptions. 

Although consensus algorithms are occasionally defined
over synchronous models (e.g., the synchronized gossip from
[7]), researchers more commonly assume communications to
be asynchronous. Each node has an independent clock that
"ticks" at Poisson-distributed intervals; upon each clock tick
the node initiates a round of consensus, which is assumed to
take place instantaneously. This model is an idealized version
of ALOHA-style protocols, and it sidesteps the scheduling and
interference difficulties inherent to wireless communications.
Our goal, however, is both to characterize the best possible
performance under wireless and to address interference. We
therefore adopt a synchronous model in which nodes transmit
simultaneously in slotted time. In practice, near-perfect synchronization
can be achieved via beacons, as in superframes
for 802.15, or via GPS clocks. Formally, let $x_n(t)$ denote the
signal transmitted by node n, and let $P_n(t) = |x_n(t)|^2$ denote
the transmit power, during time slot t.

2 Throughout this paper we use the Landau notation: $f(n) = O(g(n))$
implies $f(n) \le k g(n)$, $f(n) = \Omega(g(n))$ implies $f(n) \ge k g(n)$, and
$f(n) = \Theta(g(n))$ implies $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$, all for
constant k and sufficiently large n.

We suppose a path-loss propagation model. Each node n
has a geographic location $r_n \in [0, 1] \times [0, 1]$ in the unit
square, which we take to be independently drawn from a
uniform distribution. Under the path-loss model, the channel
gain between any two nodes m, n is

$h_{mn} = e^{j\theta_{mn}} \|r_m - r_n\|_2^{-\alpha/2}$, (1)

where $\alpha \ge 2$ is the path-loss exponent, and $\theta_{mn} \in [0, 2\pi)$ is
a random phase with unspecified distribution.
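
As a concrete illustration of the path-loss model, the following minimal sketch draws node positions uniformly in the unit square and evaluates the channel gain between two nodes. The function names `sample_positions` and `channel_gain` are hypothetical, and the uniformly random phase is an assumption made only for this example, since the model leaves the phase distribution unspecified.

```python
import cmath
import math
import random


def sample_positions(num_nodes, rng):
    """Draw node locations independently and uniformly from the unit square."""
    return [(rng.random(), rng.random()) for _ in range(num_nodes)]


def channel_gain(r_m, r_n, alpha, theta):
    """Path-loss gain: a unit-modulus phase term times distance^(-alpha/2)."""
    distance = math.dist(r_m, r_n)
    return cmath.exp(1j * theta) * distance ** (-alpha / 2.0)


rng = random.Random(0)
positions = sample_positions(4, rng)
# Assumption: the phase is drawn uniformly on [0, 2*pi) for this example only.
theta = rng.uniform(0.0, 2.0 * math.pi)
h = channel_gain(positions[0], positions[1], alpha=3.0, theta=theta)
# The received power scaling |h|^2 equals distance^(-alpha), whatever the phase.
power_gain = abs(h) ** 2
```

Only the magnitude of the gain matters for single-node connectivity; the phase becomes relevant once several nodes transmit the same signal cooperatively.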

We further suppose a "protocol" connectivity model. We
say that the signal $x_m(t)$ arrives at node n provided the
received power is above an arbitrary threshold $\gamma$. Define
the neighborhood of node n as the set of nodes whose
transmissions have sufficient received power:

$\mathcal{N}_n(t) = \{m : P_m(t) |h_{mn}|^2 \ge \gamma\} = \{m : P_m(t) \ge \gamma \|r_m - r_n\|_2^{\alpha}\}$. (2)

For nodes $m \notin \mathcal{N}_n(t)$, we assume that node n suffers no
interference from node m's transmission. This assumption
permits a tractable, geometric analysis of connectivity.
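
The neighborhood definition translates directly into a distance test. The sketch below is illustrative only: `neighborhood` and `power_for_radius` are hypothetical helper names, and positions and powers are passed as plain Python lists.

```python
import math


def neighborhood(n, positions, powers, alpha, gamma):
    """Return the nodes m whose signal reaches node n under the protocol model.

    Node m is heard at n when P_m * |h_mn|^2 >= gamma, which is equivalent to
    P_m >= gamma * ||r_m - r_n||^alpha.
    """
    heard = set()
    for m, (r_m, p_m) in enumerate(zip(positions, powers)):
        if m == n:
            continue
        if p_m >= gamma * math.dist(r_m, positions[n]) ** alpha:
            heard.add(m)
    return heard


def power_for_radius(radius, alpha, gamma):
    """Transmit power that makes a node audible out to distance `radius`."""
    return gamma * radius ** alpha


positions = [(0.0, 0.0), (0.1, 0.0), (0.9, 0.9)]
powers = [power_for_radius(0.2, alpha=3.0, gamma=1.0)] * 3
neighbors_of_node_0 = neighborhood(0, positions, powers, alpha=3.0, gamma=1.0)
```

Because connectivity depends on the transmitter's power, raising `powers[m]` enlarges the set of nodes that hear m, which is the adjustable-topology feature the model is meant to capture.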

In hierarchical averaging, which we present in Section V,
we group nodes into clusters which transmit cooperatively. In
this case we must expand the definition of neighborhoods to
characterize the number of unique signals arriving at node n.
Let $C \subset \{1, \dots, N\}$ denote a cluster of nodes transmitting
the signal $x_C(t)$. Define the received power at node n as

$R_{C,n}(t) = E\left[ \left| \sum_{m \in C} h_{mn} \sqrt{P_m(t)} \right|^2 \right]$,

where the expectation is taken over the random phases. Then,
the neighborhood of n is the set of all clusters C such that the
received power exceeds $\gamma$:

$\mathcal{N}_n(t) = \left\{ C : E\left[ \left| \sum_{m \in C} h_{mn} \sqrt{P_m(t)} \right|^2 \right] \ge \gamma \right\}$.

The connectivity of clusters depends on the distribution of the
phases $\theta_{mn}$. In the sequel we consider two choices. First, we
consider the simple case in which the phases are equal and
fixed. In this case, signals constructively combine at receivers,
and the neighborhood of n can be written as

$\mathcal{N}_n(t) = \left\{ C : \left( \sum_{m \in C} |h_{mn}| \sqrt{P_m(t)} \right)^2 \ge \gamma \right\}$. (3)

The second, and more realistic, case we consider is that
each $\theta_{mn}$ is independently and uniformly distributed across
$[0, 2\pi)$. In this case signals do not combine coherently, and
the neighborhood of n is

$\mathcal{N}_n(t) = \left\{ C : \sum_{m \in C} |h_{mn}|^2 P_m(t) \ge \gamma \right\}$. (4)
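
The difference between the two phase models can be seen numerically. The following sketch, with hypothetical function names, evaluates the received power of a cluster under each assumption; it takes the transmitter-to-receiver distances as given rather than computing them from positions.

```python
import math


def received_power_coherent(distances, powers, alpha):
    """Equal, fixed phases: amplitudes add, so the power is
    (sum_m |h_mn| * sqrt(P_m))^2 with |h_mn| = distance^(-alpha/2)."""
    amplitude = sum(
        d ** (-alpha / 2.0) * math.sqrt(p) for d, p in zip(distances, powers)
    )
    return amplitude ** 2


def received_power_incoherent(distances, powers, alpha):
    """Independent uniform phases: cross terms vanish in expectation,
    so the expected power is sum_m |h_mn|^2 * P_m."""
    return sum(d ** (-alpha) * p for d, p in zip(distances, powers))


# Four cluster members, all at distance 0.5 from the receiver, unit power each.
distances = [0.5] * 4
powers = [1.0] * 4
coherent = received_power_coherent(distances, powers, alpha=2.0)
incoherent = received_power_incoherent(distances, powers, alpha=2.0)
```

For K equidistant transmitters with equal power, the coherent power grows as K squared while the incoherent power grows only as K, which is why the phase assumption affects the energy needed for a cluster to reach distant receivers.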



Our final assumption is a simple orthogonalized approach
to interference management. For every $m \in \mathcal{N}_n(t)$, node n
receives the following signal:

$y_{mn}(t) = h_{mn} x_m(t) + w_{mn}(t)$, (5)

where $w_{mn}(t)$ is unit-variance Gaussian noise. In other words,
incoming signals arrive independently and do not interfere. 
In order to avoid such interference, incoming transmissions 
must arrive on orthogonal sub-channels. We leave the nature 
of the sub-channels unspecified; they may be realized in time, 
frequency, or code. In order for nodes to orthogonalize, there
must be

$B(t) = \max_n |\mathcal{N}_n(t)|$ (6)

sub-channels available during time slot t. We do not worry
about the specific allocation of nodes to sub-channels. In addition
to distributed techniques such as asynchronous CDMA,
there exist distributed graph coloring algorithms [25] that
achieve an order-optimal allocation of sub-channels to signals.
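
The number of sub-channels required in a slot is simply the largest neighborhood size. A minimal sketch, assuming single-node (not cluster) transmissions and using the hypothetical name `subchannels_needed`:

```python
import math


def subchannels_needed(positions, powers, alpha, gamma):
    """B(t) = max_n |N_n(t)|: the largest number of signals any node must separate."""
    largest = 0
    for n, r_n in enumerate(positions):
        heard = sum(
            1
            for m, r_m in enumerate(positions)
            if m != n and powers[m] >= gamma * math.dist(r_m, r_n) ** alpha
        )
        largest = max(largest, heard)
    return largest


# Three collinear nodes; each transmits with enough power to cover radius 0.15.
positions = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]
powers = [1.0 * 0.15 ** 2.0] * 3
b_t = subchannels_needed(positions, powers, alpha=2.0, gamma=1.0)
```

In this toy layout the middle node hears both of its neighbors while the end nodes hear one each, so the middle node determines the bandwidth requirement for the slot.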

B. Performance Metrics: Infinite-Rate Links 

We first consider the case in which the links between 
neighboring nodes are perfect; that is, at time t node n decodes 
a real-valued scalar from each $m \in \mathcal{N}_n(t)$. This is obviously a
simplification, since wireless links are necessarily rate-limited. 
However, most gossip algorithms are founded on the ability 
to exchange values with infinite precision, and we will make 
this assumption in the first part of this work. Later we will 
assume finite-rate links, which will necessitate a different set 
of metrics. 

The first figure of merit under consideration is the
$\epsilon$-averaging time. During each time slot t, nodes exchange estimates
with neighbors and update their estimates accordingly.
The $\epsilon$-averaging time, denoted $T_\epsilon$, is the number of time slots
required to achieve consensus to within a specified tolerance:

$T_\epsilon = \sup_{z(0) \in \mathbb{R}^N} \inf \left\{ t : \Pr\left( \frac{\|z(t) - z_{ave}\mathbf{1}\|}{\|z(0)\|} \ge \epsilon \right) \le \epsilon \right\}$, (7)

where z(t) is the vector of estimates $z_n(t)$ and $\mathbf{1}$ is the
all-ones vector. The scaling
law of $T_\epsilon$ is the primary focus of study for most gossip
algorithms. However, it provides only a partial measure of
resource consumption in wireless networks, so we consider
other resources as well.

We next examine energy, which is scarce in networks 
composed of cheap, battery-powered nodes. Define the total 
transmit energy as the energy required to achieve consensus
to within the tolerance $\epsilon$:

$E_\epsilon = \sum_{n=1}^{N} \sum_{t=1}^{T_\epsilon} P_n(t)$. (8)

Supposing each time slot to be of equal length, the transmit
power $P_n(t)$ is proportional to the energy consumed by node
n over slot t. Summing over nodes and time slots yields the
total energy consumed.

The final figure of merit is the time-bandwidth product,
defined as

$B_\epsilon = \sum_{t=1}^{T_\epsilon} B(t) = \sum_{t=1}^{T_\epsilon} \max_n |\mathcal{N}_n(t)|$. (9)

The metric $B_\epsilon$ gives the total number of sub-channel uses
required to achieve consensus to tolerance $\epsilon$. As mentioned
previously, we leave unspecified whether the sub-channels are
realized in time, frequency, or code. However, $T_\epsilon$ represents
the temporal component of the time-bandwidth product. The
sequential nature of consensus dictates that $T_\epsilon$ rounds occur
in succession. Therefore $T_\epsilon$ characterizes a constraint on the
realization of the time-bandwidth product. All of the time-bandwidth
product may be realized with temporal resources,
but only a fraction of it may be realized by frequency
resources.
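
Given a record of what happened in each slot, the energy and time-bandwidth metrics are simple sums. The sketch below assumes a hypothetical bookkeeping format in which `power_schedule[t][n]` holds the transmit power of node n in slot t and `neighborhood_sizes[t][n]` holds the number of signals node n receives in slot t; neither structure comes from the paper.

```python
def total_energy(power_schedule):
    """Sum of P_n(t) over all slots t and nodes n."""
    return sum(sum(slot) for slot in power_schedule)


def time_bandwidth_product(neighborhood_sizes):
    """Sum over slots of max_n |N_n(t)|, the sub-channels needed per slot."""
    return sum(max(slot) for slot in neighborhood_sizes)


# Two slots, two nodes.
power_schedule = [[1.0, 2.0], [0.5, 0.5]]
neighborhood_sizes = [[1, 3], [2, 2]]
energy = total_energy(power_schedule)
bandwidth = time_bandwidth_product(neighborhood_sizes)
rounds = len(power_schedule)
```

The number of rounds is the length of the schedule, so the three resource metrics can be read off the same record.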

C. Performance Metrics: Finite-Rate Links 

In practice, wireless links are noisy and therefore have 
finite rate, which precludes the infinite-precision exchange 
of scalars. Instead, nodes must quantize their estimates to a 
finite alphabet prior to each round of consensus. To simplify 
the discussion, we suppose that the measurements $z_n(0)$ are
drawn from the finite interval [0, 1). Throughout this paper, 
we employ dithered uniform quantization described in 
The quantization alphabet Z is defined as

$Z = \left\{ \frac{1}{L+1}, \frac{2}{L+1}, \dots, \frac{L}{L+1} \right\}$ (10)

for some alphabet size L. The quantizer is defined as

$\phi(z) = \arg\min_{q \in Z} |q - (z + u)|$, (11)

where u is a dither, drawn uniformly and randomly from
$[-\Delta/2, \Delta/2)$ each time $\phi$ is called. Statistically, we can write
the quantized value as

$\phi(z) = z + v$,

where v is uniformly distributed across $[-\Delta/2, \Delta/2)$ and
independent of z.
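
A minimal sketch of the dithered quantizer follows. It assumes that the dither width equals the alphabet spacing 1/(L+1), which is the usual choice for dithered uniform quantization but is not stated explicitly in the text above; the function name `dithered_quantize` is hypothetical.

```python
import random


def dithered_quantize(z, alphabet_size, rng):
    """Quantize z in [0, 1) to the nearest point of Z = {1/(L+1), ..., L/(L+1)}
    after adding a uniform dither u in [-delta/2, delta/2).

    Assumption: delta = 1/(L+1), the spacing between alphabet points.
    """
    delta = 1.0 / (alphabet_size + 1)
    u = rng.uniform(-delta / 2.0, delta / 2.0)
    alphabet = [k * delta for k in range(1, alphabet_size + 1)]
    return min(alphabet, key=lambda q: abs(q - (z + u)))


rng = random.Random(0)
samples = [dithered_quantize(0.37, alphabet_size=9, rng=rng) for _ in range(20000)]
# For inputs away from the ends of the alphabet, the sample mean should sit
# close to the input value, reflecting the zero-mean quantization error.
sample_mean = sum(samples) / len(samples)
```

Inputs within half a step of 0 or 1 are clipped to the end points of the alphabet, so the zero-mean property holds only for interior values in this sketch.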

The alphabet size $L = |Z|$ depends on the quality of
the wireless links. Since we define connectivity at signal-to-noise
threshold $\gamma$, we take L to be determined by the
Shannon capacity of a wireless link at SNR $\gamma$. Supposing
unit bandwidth and block duration, we successfully exchange
$\log_2(1 + \gamma)$ bits over the wireless links [26], which results in
an alphabet size of $L = \lfloor 2^{\log_2(1+\gamma)} \rfloor = \lfloor 1 + \gamma \rfloor$.

With quantization it becomes difficult to speak of convergence
time. For a large class of consensus algorithms, the
dynamics do not converge on the true average to within any
finite tolerance, precluding our defining $T_\epsilon$ as before. In fact,
quantization induces a tradeoff between resource consumption
and estimate quality.

For a consensus algorithm with quantization, let T be the
number of rounds for which we choose consensus to run. Then
let B and E be the time-bandwidth product and total transmit
energy, defined as before but with T taking the role of $T_\epsilon$.
Finally, define the mean squared error as

$\sigma^2 = \max_{z(0) \in [0,1)^N} E\left[ \frac{1}{N} \sum_{n=1}^{N} |z_n(T) - z_{ave}|^2 \right]$, (12)

where the expectation is taken over any randomness in the
quantization operator as well as in the consensus algorithm.
There is an inherent tradeoff between the total transmit energy
E and the mean-squared error $\sigma^2$; we can always reduce the
estimation error by injecting more transmit energy into the
network and increasing the rate of the wireless links.

Finally, throughout this paper we will rely on the following
lemma, which shows that the number of nodes in a region is
asymptotically proportional to its area to within an arbitrary
tolerance $\delta$.

Lemma 1 (Ozgur-Leveque-Tse, [12]): Let $A \subset [0, 1] \times [0, 1]$
be a region inside the unit square having area |A|, and
let $C = \{n : r_n \in A\}$ be the nodes lying in A. Then, for any
$\delta > 0$,

$(1 - \delta)|A|N < |C| < (1 + \delta)|A|N$, (13)

with probability greater than $1 - \frac{1}{|A|} e^{-\Gamma(\delta)|A|N}$, where
$\Gamma(\delta) > 0$ and is independent of N and |A|.



III. Inner Bounds 

In this section we derive inner bounds on the resource costs 
for consensus over the proposed wireless model. We begin 
with the case of infinite-rate links. 

Theorem 1: For any consensus algorithm, we have, with
probability approaching 1 as $N \to \infty$:

$T_\epsilon = B_\epsilon = \Omega(1)$, (14)
$E_\epsilon = \Omega(N^{1-\alpha/2})$. (15)

Proof: The bounds on $T_\epsilon$ and $B_\epsilon$ are trivial. To prove the
bound on $E_\epsilon$, we observe that every node n must transmit its
measurement $z_n(0)$ to at least one of its neighbors. The energy
required for each node to transmit to its nearest neighbor can
be expressed as

$E_\epsilon \ge \sum_{n=1}^{N} \min_{m \ne n} \gamma |h_{mn}|^{-2} = \gamma \sum_{n=1}^{N} d_{min}^{\alpha}(n)$, (16)

where $d_{min}(n)$ is the distance between node n and its nearest
neighbor. It is well-known (e.g., in [27]) that $d_{min}(n) = \Theta(N^{-1/2})$
with high probability, so

$E_\epsilon \ge \gamma \sum_{n=1}^{N} \Theta(N^{-\alpha/2}) = \Omega(N^{1-\alpha/2})$. (17) ■
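
The scaling in the energy bound can be checked empirically on random networks. The following brute-force sketch evaluates the nearest-neighbor energy sum for a few network sizes and normalizes it by $N^{1-\alpha/2}$; the function names, the seed, and the chosen sizes are illustrative assumptions, and the run is a sanity check rather than a proof.

```python
import math
import random


def nearest_neighbor_distances(positions):
    """Distance from each node to its closest other node (O(N^2) brute force)."""
    return [
        min(math.dist(r_n, r_m) for m, r_m in enumerate(positions) if m != n)
        for n, r_n in enumerate(positions)
    ]


def energy_lower_bound(positions, alpha, gamma):
    """Evaluate gamma * sum_n d_min(n)^alpha for one network realization."""
    return gamma * sum(d ** alpha for d in nearest_neighbor_distances(positions))


rng = random.Random(0)
alpha = 3.0
for num_nodes in (100, 400, 900):
    pts = [(rng.random(), rng.random()) for _ in range(num_nodes)]
    bound = energy_lower_bound(pts, alpha=alpha, gamma=1.0)
    # If the bound scales as N^(1 - alpha/2), this ratio should stay of the
    # same order as N grows.
    print(num_nodes, bound / num_nodes ** (1.0 - alpha / 2.0))
```

For path-loss exponents above 2 the bound decreases with N: denser networks have closer neighbors, and the per-node saving outweighs the growth in the number of transmitters.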



In the case of consensus with rate-limited links, we can 
derive an inner bound on the tradeoff between resources and 
estimation error. 

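Theorem 2: For any consensus algorithm with rate-limited
links, any achievable tradeoff in performance metrics satisfies
the following with high probability:

$T = B = \Omega(1)$, (18)
$E = \Omega\left( \sum_{n=1}^{N} N^{x_n - \alpha/2} \right)$, (19)
$\sigma^2 = \Omega\left( \frac{1}{N} \sum_{n=1}^{N} N^{-2x_n} \right)$, (20)

for $x_n > 0$. In particular, choosing each $x_n = x$ yields

$E = \Omega(N^{1+x-\alpha/2})$, (21)
$\sigma^2 = \Omega(N^{-2x})$. (22)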

Proof: As in the ideal-link case, the bounds on T and B
are trivial. To bound the tradeoff between energy and estimation
error, momentarily consider a single node n. Suppose a
genie supplies node n with $z_{ave}$, and further suppose that only
node n's nearest neighbor, denoted by m, needs to compute
the average. In this case, the optimal strategy is for n to
quantize $z_{ave}$ and transmit it directly to node m. In principle,
other nodes could transmit their measurements to m, but since
they are no closer order-wise, and since they have only partial
knowledge of the average, any energy they expend would be
better used by node n.

Without loss of generality, let $P_n = N^{x_n - \alpha/2}$ denote
the transmit power used by node n to transmit $z_{ave}$. Since
again the distance between nearest neighbors is $\Theta(N^{-1/2})$
with high probability, the size of the quantization alphabet
is $L = \Theta(N^{x_n})$. Therefore, the squared quantization error at
node n on $z_{ave}$ is $|e_n|^2 = \Omega(L^{-2}) = \Omega(N^{-2x_n})$. Repeating
the argument for each n gives the result. ■
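
The alphabet-size step and the specialization to a common exponent can be spelled out as follows. This is a sketch of the arithmetic only: it assumes unit-variance noise, uses the alphabet-size rule of Section II with the received SNR in place of the threshold, and assumes that the per-node bounds aggregate as a sum of transmit powers and an average of squared errors. The symbol $\mathrm{SNR}_n$ is introduced here for illustration.

```latex
% Received SNR at the nearest neighbor, at distance \Theta(N^{-1/2}):
\mathrm{SNR}_n = P_n \, d_{\min}^{-\alpha}(n)
             = N^{x_n - \alpha/2} \cdot \Theta\big(N^{\alpha/2}\big)
             = \Theta\big(N^{x_n}\big),
\qquad
L = \lfloor 1 + \mathrm{SNR}_n \rfloor = \Theta\big(N^{x_n}\big).

% Choosing x_n = x for every node n:
E = \Omega\Big( \sum_{n=1}^{N} N^{x - \alpha/2} \Big)
  = \Omega\big( N^{1 + x - \alpha/2} \big),
\qquad
\sigma^2 = \Omega\Big( \frac{1}{N} \sum_{n=1}^{N} N^{-2x} \Big)
         = \Omega\big( N^{-2x} \big).
```

Read together, the two expressions show the tradeoff: each unit increase in x multiplies the energy bound by N and divides the error bound by $N^2$.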

IV. Gossip Algorithms 

In this section we characterize existing gossip algorithms
with respect to the metrics defined in Section II. There are, of
course, too many instantiations of gossip for us to analyze
every one, so we focus on two variants³ that we claim
give a relatively comprehensive look at the state of the art:
randomized gossip [7], which is probably the best-known
approach to gossip, and path averaging [8], which is order
optimal in terms of convergence speed. Our first task is to
adapt the graphical nature of gossip to our wireless model.
The key criterion is the required transmit power. In order to
achieve consensus, it is necessary to choose the topology of the
network such that the resulting graph is connected. In [10] it is
shown that, with high probability, a necessary and sufficient
condition for connectedness is that each node be connected
to every node within a radius of $\Theta(\sqrt{\log N / N})$. In terms of
the neighborhood $\mathcal{N}_n(t)$, we require, for every node n that
transmits during time slot t, that

$\mathcal{N}_n(t) = \{m : \|r_m - r_n\|_2 \le \Theta(\sqrt{\log N / N})\}$.

By (2), we must have

$P_n(t) = \gamma r^\alpha = \Theta(\gamma (\log N / N)^{\alpha/2})$, (23)

where r is the connectivity radius, for every node n transmitting
during time slot t. This holds
for both gossip algorithms considered in this section.

3 Due to its similarity with hierarchical averaging, we might suspect that
multiscale gossip [9] has superior performance to the gossip algorithms
studied here with respect to our metrics. While we do not carry out the
analysis here due to space constraints, one can show that the performance
of multi-scale gossip is very similar to that of path averaging.

A. Randomized Gossip 

We study the synchronized randomized gossip of [7]. At
each time slot t, each node is randomly paired up with one of
its neighbors. Paired nodes exchange estimates and average the
estimates together, which results in the following dynamics:

$z(t) = \frac{1}{2}(W(t) + I) z(t-1)$,

where W(t) is a randomly chosen permutation matrix such
that $w_{mn} = 1$ only if nodes m and n are neighbors.
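
One round of these dynamics can be sketched without forming the matrix explicitly. The greedy random matching used below is an assumption made for illustration, since the text does not specify how the pairing is drawn; `gossip_round` and the `neighbors` adjacency dictionary are hypothetical names.

```python
import random


def gossip_round(z, neighbors, rng):
    """One synchronous round: build a random matching among neighbors and
    replace each matched pair's estimates by their average.

    This corresponds to z(t) = 0.5 * (W(t) + I) z(t-1), with W(t) the
    permutation matrix that swaps matched nodes and fixes unmatched ones.
    Assumption: neighbors[n] never contains n itself.
    """
    order = list(range(len(z)))
    rng.shuffle(order)
    matched = set()
    new_z = list(z)
    for n in order:
        if n in matched:
            continue
        free = [m for m in neighbors[n] if m not in matched]
        if not free:
            continue  # node n stays unpaired this round
        m = rng.choice(free)
        matched.update((n, m))
        new_z[n] = new_z[m] = 0.5 * (z[n] + z[m])
    return new_z


# Six nodes on a ring, each connected to its two adjacent nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
rng = random.Random(0)
for _ in range(500):
    z = gossip_round(z, ring, rng)
```

Pairwise averaging preserves the sum of the estimates, so the only value on which all nodes can agree is the true average.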

In [7, Theorem 9] the convergence of randomized gossip is
characterized. It is shown that the averaging time satisfies

$T_\epsilon = \Theta\left( N \frac{\log \epsilon^{-1}}{\log N} \right)$. (24)

Using these facts, we can derive statements about the performance
of randomized gossip with respect to our metrics.

Theorem 3: For randomized gossip, the resource consumption
scales as follows with high probability:

$T_\epsilon = \Theta\left( N \frac{\log \epsilon^{-1}}{\log N} \right)$, (25)
$B_\epsilon = \Theta(N \log \epsilon^{-1})$, (26)
$E_\epsilon = \Theta(N^{2-\alpha/2} (\log N)^{\alpha/2-1} \log \epsilon^{-1})$. (27)

Proof: The bound on $T_\epsilon$ follows from (24). Since every
node transmits during every time slot t, and since we take $\gamma$
to be constant, for each t we have $P_n(t) = \Theta((\log N / N)^{\alpha/2})$.
Therefore we have

$E_\epsilon = \sum_{t=1}^{T_\epsilon} \sum_{n=1}^{N} \Theta\left( \left( \frac{\log N}{N} \right)^{\alpha/2} \right)$
$= T_\epsilon \Theta(N^{1-\alpha/2} (\log N)^{\alpha/2})$
$= \Theta(N^{2-\alpha/2} (\log N)^{\alpha/2-1} \log \epsilon^{-1})$.

Next, the required connectivity radius means that each neighborhood
is defined by a region of area $\pi \log N / N$. By
Lemma 1, each neighborhood size satisfies

$(1 - \delta)\pi \log N < |\mathcal{N}_n(t)| < (1 + \delta)\pi \log N$,

so that $|\mathcal{N}_n(t)| = \Theta(\log N)$
with high probability. Plugging this into (9) gives us

$B_\epsilon = T_\epsilon \Theta(\log N) = \Theta(N \log \epsilon^{-1})$. ■



B. Path Averaging 

Next we look at path averaging, a more sophisticated gossip
algorithm proposed in [8]. Instead of exchanging estimates
with a neighbor, in path averaging each node chooses a
geographically distant node with which to exchange; the
exchange is facilitated by multi-hop routing. In addition to
facilitating the exchange, the routing nodes add their estimates
"along the way," allowing many nodes to average together in
a single round. Once the average of the estimates along the route
is computed at the destination, the result is routed back to the
source.

Path averaging is described in an asynchronous framework
in which nodes independently "wake up," initiate multi-hop
exchanges, and return to idle state sufficiently quickly that no
two exchanges overlap in time. Placing path averaging into
our synchronous framework, we suppose that at time t a pair
of nodes n, m is randomly selected to engage in a multi-hop
exchange. Letting $\mathcal{P}(t)$ be the set of nodes routing from n to
m, we suppose that the $2(|\mathcal{P}(t)| - 1)$ transmissions required
to route from n to m and back happen sequentially and thus
require $2(|\mathcal{P}(t)| - 1)$ time slots. At time slot $t + 2(|\mathcal{P}(t)| - 1)$,
a new pair is chosen. The dynamics for path averaging have the
following form:

$z_n(t + 2(|\mathcal{P}(t)| - 1)) = \begin{cases} \frac{1}{|\mathcal{P}(t)|} \sum_{k \in \mathcal{P}(t)} z_k(t), & n \in \mathcal{P}(t) \\ z_n(t), & \text{otherwise.} \end{cases}$ (28)

In [8, Theorem 2] it is shown that, for a random uniform
network,⁴ the expected path length is $E[|\mathcal{P}(t)|] = \Theta(\sqrt{N / \log N})$
and the number of exchanges required to
achieve $\epsilon$-consensus is $\Theta(\sqrt{N \log N} \log \epsilon^{-1})$. Combining
these facts, the total number of required transmissions is
$\Theta(N \log \epsilon^{-1})$.
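
The update applied by a single multi-hop exchange can be sketched as below. The route itself is taken as an input: how it is chosen (for example by geographic routing) is outside this sketch, and the function names are hypothetical. The route is assumed to visit each node at most once.

```python
def path_average(z, path):
    """Replace the estimate of every node on `path` by the average over the
    path; nodes off the path keep their current estimates."""
    average = sum(z[n] for n in path) / len(path)
    on_path = set(path)
    return [average if n in on_path else z_n for n, z_n in enumerate(z)]


def slots_for_exchange(path):
    """Sequential transmissions needed to route along the path and back."""
    return 2 * (len(path) - 1)


z = [0.0, 2.0, 4.0, 10.0]
path = [0, 1, 2]  # hypothetical route from node 0 to node 2 via node 1
z_next = path_average(z, path)
elapsed_slots = slots_for_exchange(path)
```

A single exchange averages as many nodes as the route contains, which is the source of the speed-up over pairwise gossip, at the price of occupying the channel for a number of slots proportional to the route length.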

In casting path averaging in our synchronous framework, we 
have retained the assumption that multi-hop exchanges do not 
overlap in time. In principle one could construct a synchronous 
path-averaging gossip in which multiple exchanges occur 
simultaneously, perhaps reducing the total amount of time 
required to achieve consensus. In the following theorem, we 
provide a rather optimistic bound on the resource consumption 
of any such synchronous formulation. 

Theorem 4: For any synchronous path-averaging gossip, the
resource consumption scales as follows with high probability:

$T_\epsilon = B_\epsilon = \Omega\left( \sqrt{N / \log N} \right)$, (29)
$E_\epsilon = \Theta(N^{1-\alpha/2} (\log N)^{\alpha/2} \log \epsilon^{-1})$. (30)

Proof: We prove the bound on $T_\epsilon$ and $B_\epsilon$ by noting that
each route has $\Theta(\sqrt{N / \log N})$ hops. Even in the ideal case in
which every round of gossip occurs simultaneously, we still
require $T_\epsilon = \Omega(\sqrt{N / \log N})$ sequential transmissions. Assuming
that constant bandwidth is sufficient to accommodate
the multiple exchanges, the same bound applies to $B_\epsilon$.

To bound $E_\epsilon$ we point out that, as with randomized gossip,
we require $P_n(t) = \Theta((\log N / N)^{\alpha/2})$ for every transmission.
Since path averaging requires $\Theta(N \log \epsilon^{-1})$ transmissions, the
overall energy consumption scales as

$E_\epsilon = \Theta(N^{1-\alpha/2} (\log N)^{\alpha/2} \log \epsilon^{-1})$. (31) ■

4 Technically, the convergence speed of path averaging is proven over a
torus, so the results we prove in the sequel apply to the torus. Later we show
numerical results that establish empirically that the same results apply to a
square network.



V. Hierarchical averaging 

In this section we present hierarchical averaging. Much like 
multi-scale gossip [9] and the hierarchical cooperation of [12],
in hierarchical averaging we recursively partition the network 
into geographically defined clusters. Each cluster achieves 
internal consensus by mutually broadcasting estimates. Nodes 
within a cluster then cooperatively broadcast their identical 
estimates to neighboring clusters at the next level. The process 
continues until the entire network achieves consensus. In the 
following subsection we describe the recursive partition, after 
which we describe the algorithm in detail and characterize its 
resource requirements. 

A. Hierarchical Partitioning 

We partition the network into T sub-network layers, one
for each round of consensus, as depicted in Figure 1. At
the top layer, which corresponds to the final round t = T
of consensus, there is a single cell. At the next-highest level
$t = T - 1$, we divide the network into four equal-area square
cells. Continuing, we recursively divide each cell into four
smaller cells until the lowest layer t = 1, which corresponds
to the first round of consensus. At each level t there are $4^{T-t}$
cells, formally defined as

$C_{jk}(t) = \{n : r_n \in [(j-1)2^{t-T}, j 2^{t-T}) \times [(k-1)2^{t-T}, k 2^{t-T})\}$, (32)

where $1 \le j, k \le 2^{T-t}$ index the geographical location of the
cell.

Let C(n, t) denote the unique cell at layer t containing node
n. Using the Pythagorean theorem, we can easily bound the
maximum distance between any two nodes in the same layer-t cell:

$M(t) = \sqrt{2} \cdot 4^{(t-T)/2} = \Theta(4^{(t-T)/2})$, (33)

where the maximum is achieved when two nodes lie on
opposite corners of the cell.

We take $T = \lceil \log_4(N^{1-\kappa}) \rceil$, where $\kappa > 0$ is a small
constant. In the following lemma we prove the order-wise 
cardinality of each cell, which we will use in deriving the 
resource consumption of hierarchical averaging. 

Fig. 1. Hierarchical partition of the network. Each square cell is divided into four smaller cells, which are each divided into four smaller cells, and so on.

Lemma 2: For every $1 \le j, k \le 2^{T-t}$ and $1 \le t \le T$, we
simultaneously have

$$|C_{jk}(t)| = \Theta(4^t N^\kappa) \tag{34}$$

with probability greater than $1 - N^{2-2\kappa}/16 \cdot e^{-\Gamma(\delta) N^\kappa}$.

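Proof: The area of each cell at layer $t = 1$ is, by
construction, $A = 4^{1-T}$. Since $\log_4(N^{1-\kappa}) \le T \le \log_4(N^{1-\kappa}) + 1$, we have

$$4^{1 - (\log_4(N^{1-\kappa}) + 1)} \le A \le 4^{1 - \log_4(N^{1-\kappa})},$$

$$N^{\kappa-1} \le A \le 4N^{\kappa-1}.$$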

Then, by Lemma 1, the cardinality of each cell at layer $t = 1$
is bounded by

$$(1 - \delta)N^\kappa \le |C_{jk}(1)| \le (1 + \delta)4N^\kappa, \tag{35}$$

with probability greater than $1 - N^{1-\kappa}/16 \cdot e^{-\Gamma(\delta) N^\kappa}$.

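Define $E_{jk}(1)$ as the event in which $|C_{jk}(1)|$ is outside the
bounds specified in (35). Clearly $\Pr\{E_{jk}(1)\} \le N^{1-\kappa}/16 \cdot e^{-\Gamma(\delta) N^\kappa}$.
Therefore, by the union bound, we have

\begin{align*}
\Pr\left( \bigcup_{1 \le j,k \le 2^{T-1}} E_{jk}(1) \right) &\le \sum_{1 \le j,k \le 2^{T-1}} N^{1-\kappa}/16 \cdot e^{-\Gamma(\delta) N^\kappa} \tag{36} \\
&\le N^{2-2\kappa}/16 \cdot e^{-\Gamma(\delta) N^\kappa} \to 0. \tag{37}
\end{align*}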

Therefore, every cell at $t = 1$ simultaneously satisfies
$|C_{jk}(1)| = \Theta(N^\kappa)$ with the desired probability. Now, since
each cell at layer $t$ is composed of $4^{t-1}$ cells at layer 1, we
have with the same probability

$$|C_{jk}(t)| = \Theta(4^t N^\kappa). \tag{38}$$
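
The concentration claimed in Lemma 2 can be checked numerically. The sketch below assumes that nodes are placed independently and uniformly at random on the unit square (the placement model is an assumption of this example, as is the helper name `layer1_counts`). It counts the nodes in every layer-1 cell so that the occupancies can be compared with the range $[N^\kappa, 4N^\kappa]$, which brackets the expected count $N \cdot 4^{1-T}$.

```python
import math
import random
from collections import Counter

def layer1_counts(N, kappa, seed=0):
    """Drop N uniform nodes on the unit square and count them per layer-1 cell.

    Returns (T, side, counts), where T = ceil(log_4 N^(1 - kappa)), side is the
    number of layer-1 cells per axis, and counts maps 0-based (j, k) to the
    number of nodes in that cell. Uniform placement is an assumption.
    """
    rng = random.Random(seed)
    # Note: floating-point log may round up when N^(1 - kappa) is an exact power of 4.
    T = math.ceil(math.log(N ** (1 - kappa), 4))
    side = 2 ** (T - 1)
    counts = Counter()
    for _ in range(N):
        x, y = rng.random(), rng.random()
        counts[(int(x * side), int(y * side))] += 1
    return T, side, counts
```

With $N = 1000$ and $\kappa = 0.25$ this gives $T = 4$ and 64 layer-1 cells, so the expected occupancy is $1000/64 \approx 15.6$, which lies between $N^\kappa \approx 5.6$ and $4N^\kappa \approx 22.5$.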



B. Algorithm Description 

Here we lay out the details of hierarchical averaging. We 
suppose that each node n knows the following information 
about the network: the total number of nodes N, its own 
location $r_n$, and the number of layers $T$.

First, at time slot $t = 1$ each node broadcasts its initial
estimate $z_n(0)$ to each member of its cluster $C(n, 1)$. In order
to ensure that $n \in \mathcal{N}_m(1)$ for every $m \in C(n, 1)$, each node
transmits at power

$$P_n(1) = \gamma \max_{m \in C(n,1)} |h_{nm}|^{-2} \le \gamma M(1)^\alpha = O(N^{(\kappa-1)\alpha/2}). \tag{39}$$



Each node $n$ takes a weighted average of the estimates in its
cluster:

$$z_n(1) = \frac{1}{4^{1-T}N} \sum_{m \in C(n,1)} z_m(0). \tag{40}$$

We use the approximate normalization factor $1/(4^{1-T}N)$ instead
of the exact factor $1/|C(n, 1)|$ so that nodes at higher
levels of the hierarchy need not know the cardinality of the
cells. As we shall see, this approximation introduces no error
into the final estimate.

After time slot $t = 1$, each node in each cluster $C_{jk}(1)$
has the same estimate, which we denote by $z_{C_{jk}(1)}(1)$. At
each subsequent time slot $2 \le t \le T$, each cluster $C(n, t-1)$
cooperatively transmits its estimate to its parent cluster at
layer $t$. We take each $P_n(t)$ to be a constant. The transmit
power required depends on the phase of the channel gains,
as discussed in Section III-A. When the phases are fixed and
identical, we need, by (O, 



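$$P_m(t) = \frac{\gamma}{\left(\sum_{m' \in C(n,t-1)} |h_{m'n}|\right)^2} \le \frac{\gamma M(t)^\alpha}{|C(n,t-1)|^2} = O\left(\frac{M(t)^\alpha}{4^{2t} N^{2\kappa}}\right) = O\left(4^{(\alpha/2-2)t} N^{-\alpha/2+\kappa(\alpha/2-2)}\right). \tag{41}$$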



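When the phases are random and uniform, on the other hand,
by (4), we need

$$\sum_{m \in C(n,t-1)} |h_{mn}|^2 P_m(t) = \gamma,$$

$$P_m(t) = \frac{\gamma}{\sum_{m' \in C(n,t-1)} |h_{m'n}|^2} \le \frac{\gamma M(t)^\alpha}{|C(n,t-1)|} = O\left(4^{(\alpha/2-1)t} N^{(\kappa-1)\alpha/2}\right). \tag{42}$$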

After receiving estimates from the other sub-clusters, each
node updates its estimate by taking the sum:

$$z_n(t) = \frac{1}{4} \sum_{C(m,t-1) \subset C(n,t)} z_{C(m,t-1)}(t-1) = \frac{1}{4^{t-T}N} \sum_{m \in C(n,t)} z_m(0),$$

where the second equality follows straightforwardly by induction.
At time $t$, the identical estimate at each cluster is
a weighted average of the measurements from within that
cluster.

Consensus is achieved at round T, where the four sub- 
clusters at level t = T — 1 broadcast their estimates to the 
entire network. Evaluating the update rule above for $t = T$, we observe that
hierarchical averaging achieves perfect consensus; there is no
need for a tolerance parameter $\epsilon$. This somewhat surprising
result is the consequence of combining the flexibility of 
wireless, which allows us to adjust the network connectivity 
at will, with the simplifying assumption of infinite-rate links. 
In the next section we will revisit this assumption. 
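
The recursion described in this subsection can be sketched in a few lines of Python. The sketch is idealized: it assumes the unit-square geometry, infinite-rate links, and that every broadcast is received, and it does not model transmit power at all. Cells that happen to contain no nodes contribute zero to their parent, which matches the sums in the update rule. The function names `layer1_index` and `hierarchical_average` are hypothetical.

```python
def layer1_index(r, T):
    """0-based (j, k) of the layer-1 cell containing position r in [0, 1)^2."""
    side = 2 ** (T - 1)  # number of layer-1 cells per axis
    return (min(int(r[0] * side), side - 1), min(int(r[1] * side), side - 1))

def hierarchical_average(positions, z0, T):
    """Run T rounds of idealized hierarchical averaging and return the final estimate.

    positions: list of (x, y) node locations; z0: list of initial measurements.
    """
    N = len(z0)
    norm = 4 ** (1 - T) * N  # approximate normalization used at round 1
    est = {}
    # Round 1: every layer-1 cell forms the normalized sum of its measurements.
    for r, z in zip(positions, z0):
        cell = layer1_index(r, T)
        est[cell] = est.get(cell, 0.0) + z / norm
    # Rounds 2..T: each parent cell takes one quarter of the sum of its four children.
    for _ in range(2, T + 1):
        parent = {}
        for (j, k), value in est.items():
            key = (j // 2, k // 2)
            parent[key] = parent.get(key, 0.0) + value / 4
        est = parent
    return est[(0, 0)]  # the single top-layer cell
```

Because the factors $4^{1-T}N$ and $4^{T-1}$ multiply to $N$, the value returned equals the arithmetic mean of `z0` up to floating-point rounding, regardless of how the nodes are distributed among the cells.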

In the following theorem we derive the resource require- 
ments of hierarchical averaging. 

Theorem 5: With high probability, the resource consumption
of hierarchical averaging scales according to

$$T_\epsilon = B_\epsilon = O(N^\kappa), \tag{43}$$

$$E_\epsilon = \begin{cases} O(N^{1-\alpha/2+\kappa\alpha/2}), & \text{for fixed phase} \\ O(N^{\kappa\alpha/2}), & \text{for uniform phase} \end{cases} \tag{44}$$

for any path-loss exponent $2 \le \alpha < 4$, for any $\epsilon > 0$, and for
any $\kappa > 0$.

Proof: The bound on $T_\epsilon$ follows by construction; we
chose $T = \lceil \log_4 N^{1-\kappa} \rceil = O(N^\kappa)$ layers of hierarchy and
constructed the algorithm such that consensus is achieved to
within any tolerance $\epsilon > 0$.

We derive the bound on $B_\epsilon$ by examining the cardinality of
the neighborhoods for each node. At time slot $t = 1$, by (39),
each node transmits at power $P_n(1) = O(N^{(\kappa-1)\alpha/2})$. The
neighborhood size of each node therefore scales as the number
of nodes in a circle of radius $O(N^{(\kappa-1)/2})$. By Lemma 1, this
number is $|\mathcal{N}_n(1)| = O(N^\kappa)$ with probability approaching 1
as $N \to \infty$. Thus $B(1) = O(N^\kappa)$.

For rounds $2 \le t \le T$, we need to bound the number of
clusters in range of each node. In (41) we chose the transmit
powers such that the clusters transmit to each node in a circle
of area $\pi M^2(t) = O(4^t N^{\kappa-1})$. By construction, each cluster
$C(n,t)$ covers an area of $\Theta(4^t N^{\kappa-1})$. Therefore, the number
of clusters that can fit into the circle is constant, so we have
$B(t) = O(1)$. Summing over all rounds, we get

$$B_\epsilon = \sum_{t=1}^{T} B(t) = O(N^\kappa) + \sum_{t=2}^{T} O(1) = O(N^\kappa). \tag{45}$$



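Finally, we derive the bounds on $E_\epsilon$. For fixed phase, we
have, from (39) and (41),

\begin{align*}
E_\epsilon &= \sum_{t=1}^{T} \sum_{n=1}^{N} P_n(t) \\
&= N \cdot O(N^{(\kappa-1)\alpha/2}) + N \sum_{t=2}^{T} O\left(4^{(\alpha/2-2)t} N^{-\alpha/2+\kappa(\alpha/2-2)}\right) \\
&= O(N^{1-\alpha/2+\kappa\alpha/2}) + O\left(N^{1-\alpha/2+\kappa(\alpha/2-2)} \sum_{t=0}^{T-1} 4^{(\alpha/2-2)t}\right) \\
&= O(N^{1-\alpha/2+\kappa\alpha/2}) + O\left(N^{1-\alpha/2+\kappa(\alpha/2-2)} \frac{1 - 4^{(\alpha/2-2)T}}{1 - 4^{\alpha/2-2}}\right) \tag{46} \\
&= O(N^{1-\alpha/2+\kappa\alpha/2}) + O\left(N^{1-\alpha/2+\kappa(\alpha/2-2)} N^{(\alpha/2-2)(1-\kappa)}\right) \tag{47} \\
&= O(N^{1-\alpha/2+\kappa\alpha/2}) + O(N^{-1}) \\
&= O(N^{1-\alpha/2+\kappa\alpha/2}), \tag{48}
\end{align*}

where (46) follows from the finite geometric sum identity, and
(47) holds only when $\alpha < 4$. For uniform phase, we first note
that the condition in (42) is more strict than that of (39), so
we have

$$P_n(t) = O\left(4^{(\alpha/2-1)t} N^{(\kappa-1)\alpha/2}\right) \tag{49}$$

for all $n$, $t$. Substituting (49) into the definition of $E_\epsilon$, we get

\begin{align*}
E_\epsilon &= \sum_{t=1}^{T} \sum_{n=1}^{N} P_n(t) \tag{50} \\
&= O\left(N^{1+(\kappa-1)\alpha/2} \sum_{t=0}^{T-1} 4^{(\alpha/2-1)t}\right) \tag{51} \\
&= O\left(N^{1-\alpha/2+\kappa\alpha/2} (N^{\alpha/2-1} - 1)\right) \tag{52} \\
&= O(N^{\kappa\alpha/2}), \tag{53}
\end{align*}

where again we have employed the finite geometric sum
identity. ■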
Hierarchical averaging achieves resource scaling arbitrarily 
close to the lower bound of Theorem Q] when phase is fixed. 
When phase is uniform, however, the energy consumption is 
strictly suboptimal for $\alpha > 2$. Note that the resource scaling
does not depend on the channel phases for $\alpha = 2$. For
free-space propagation, hierarchical averaging is order optimal 
regardless of phase. 
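
To make the comparison between the two phase models concrete, the exponents of $N$ in the energy bounds of Theorem 5 can be tabulated directly. The helper below is only a sketch for exploring the stated bounds; the name `energy_exponent` and the string labels for the phase model are hypothetical.

```python
def energy_exponent(alpha, kappa, phase):
    """Exponent of N in the energy bound of Theorem 5.

    alpha: path-loss exponent with 2 <= alpha < 4; kappa: small positive constant;
    phase: "fixed" or "uniform" (labels chosen for this example).
    """
    if not 2 <= alpha < 4:
        raise ValueError("the bound is stated for 2 <= alpha < 4")
    if phase == "fixed":
        return 1 - alpha / 2 + kappa * alpha / 2
    if phase == "uniform":
        return kappa * alpha / 2
    raise ValueError("phase must be 'fixed' or 'uniform'")
```

For $\alpha = 2$ both cases return $\kappa$, which reflects the remark that the scaling does not depend on the channel phases under free-space propagation. For $\alpha = 3$ and $\kappa = 0.1$ the fixed-phase exponent is about $-0.35$, while the uniform-phase exponent is about $0.15$.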

C. Numerical Results 

We examine the empirical performance of the several consensus
algorithms presented. Choosing $\gamma = 10$ dB, $\alpha = 4$,
$\epsilon = 10^{-4}$, and $\kappa = 0$, we let $N$ run from 10 to 1000, averaging
performance over 20 random initializations for each value of
$N$. In Figure 2 we display the average transmit energy $E_\epsilon$ and
time-bandwidth product $B_\epsilon$. (Since the data for $T_\epsilon$ are rather
similar to those for $B_\epsilon$, we do not plot them.)

Fig. 2. Transmit energy $E_\epsilon$ and time-bandwidth product $B_\epsilon$ for a variety of
consensus algorithms.

With respect to time-bandwidth product, hierarchical averaging
performs best, the required number of sub-channel
uses growing slowly with $N$. The remaining two consensus
schemes perform comparably, the required number of sub-channels
growing approximately linearly in $N$. Note that,
while we bounded the time-bandwidth product of path averaging
with a strictly sub-linear term, this bound applied to
hypothetical instantiations of the scheme in which multiple
transmissions occur simultaneously. Our simulations used the
ordinary algorithm, which requires $\Theta(N)$ sub-channel uses.

With respect to total transmit energy, hierarchical averaging 
performs best so long as the phases are fixed, in which case the 
performance is on par order-wise with the lower bound. When 
phases are uniform, however, performance depends on N. 
Even though path averaging has better scaling than hierarchical 
averaging under uniform phase, for small N hierarchical av- 
eraging requires less power. Finally, as expected, randomized 
gossip requires the most energy in any regime. 



VI. Quantization 

In this section we examine consensus with quantization. 
As in the case with ideal links, we first characterize the 
performance of existing quantized consensus algorithms with 
respect to the metrics specified in Section III-C. We cannot
survey every approach in the literature, so we focus on the
quantized consensus of [21], in which consensus is modified to
preserve the average of quantized estimates each round. After
deriving bounds on its performance, we turn to hierarchical
averaging. We show that it achieves the lower bound of
Theorem 2 when phases are fixed.



A. Quantized Consensus 

In ordinary gossip, the primary difficulty of quantization 
is that quantizing estimates in general alters the average 
across the network. Thus, even if consensus is achieved, the 
dynamics will not in general converge on the true average of 
the (quantized) measurements. In quantized consensus [21], a
family of consensus algorithms is proposed that preserves the 
average at each round; it converges to near-consensus around 
the true average. 

Recall from Section III-C that $\mathcal{Z}$ is the set of $L$ points evenly
distributed across $[0, 1)$, separated by quantization bin width
$\Delta = 1/L$. Quantized consensus operates only on quantized
values, so first we must quantize the real-valued measurements
$z_n(0)$:

$$q_n(0) = \phi(z_n(0)), \tag{54}$$

where $\phi$ is the dithered quantizer described in Section III-C.
Let $e_n(0) = \phi(z_n(0)) - z_n(0)$ denote the quantization error.

Much like in randomized gossip, at each round every node
randomly selects a neighboring node and mutually averages,
with the caveat that one node rounds "up" to the nearest
member of $\mathcal{Z}$ while the other rounds "down." Letting $i$ and $j$
denote the two nodes in the exchange, we have$^5$

$$q_i(t) = \left\lceil \frac{q_i(t-1) + q_j(t-1)}{2} \right\rceil_{\mathcal{Z}}, \tag{55}$$

$$q_j(t) = \left\lfloor \frac{q_i(t-1) + q_j(t-1)}{2} \right\rfloor_{\mathcal{Z}}, \tag{56}$$

where $\lceil \cdot \rceil_{\mathcal{Z}}$ and $\lfloor \cdot \rfloor_{\mathcal{Z}}$ represent rounding up and down to the
nearest element of $\mathcal{Z}$, respectively. By [21, Theorem 1], this
algorithm is guaranteed to converge on near-consensus: in the
limit, each $q_n(t)$ differs by at most a single bin, and the sum
of the quantized measurements is preserved. It is difficult to
bound the convergence speed of this process in general due
to the non-linearity of the updates. However, for the case of
a fully-connected graph, in [21, Lemma 6] it is shown that
quantized consensus requires $\Omega(N^2)$ transmissions over $\Omega(N)$
consensus rounds. Using this fact, we can bound the overall
performance.
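
The pairwise update in (55) and (56) is easy to sketch when each quantized value is stored as an integer bin index, so that the actual value is the index times $\Delta$. The code below is an illustrative sketch rather than the algorithm of [21] itself: it assumes a fully-connected graph and draws one random pair per step, and the names `quantized_pair_update` and `run_quantized_gossip` are hypothetical.

```python
import random

def quantized_pair_update(ki, kj):
    """One exchange per (55)-(56) on integer bin indices (value = index * Delta).

    Node i rounds the pairwise average up and node j rounds it down,
    so the sum ki + kj is preserved exactly.
    """
    total = ki + kj
    return (total + 1) // 2, total // 2  # ceiling and floor of total / 2

def run_quantized_gossip(bins, steps, seed=0):
    """Apply `steps` random pairwise exchanges on a fully-connected graph (assumed)."""
    rng = random.Random(seed)
    bins = list(bins)
    for _ in range(steps):
        i, j = rng.sample(range(len(bins)), 2)  # two distinct nodes
        bins[i], bins[j] = quantized_pair_update(bins[i], bins[j])
    return bins
```

Since each exchange replaces two indices with values that lie between them and have the same sum, the network-wide sum never changes and the spread between the largest and smallest index never grows.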

Theorem 6: The performance of quantized gossip scales,
with high probability, as

$$T = \Omega(N), \tag{57}$$

$$B = \Omega(N \log N), \tag{58}$$

$$E = \Omega(N^{2-\alpha/2+x} (\log N)^{\alpha/2}), \text{ and} \tag{59}$$

$$\sigma^2 = \Omega(N^{-2x-1}), \tag{60}$$

for $x \ge 0$.

Proof: Choose an alphabet size $L$, which we allow to
vary with $N$. Then, in order to maintain connectivity, we need
to have links with signal-to-noise ratio $\gamma = \Theta(L)$ at radius
$\sqrt{\log N/N}$, which requires

$$P_n(t) = \Theta(L (\log N/N)^{\alpha/2}). \tag{61}$$



$^5$ In fact, [21] proposes a family of algorithms, and the one we use here is
only one possibility. The convergence properties we exploit in the following
are independent of the specific algorithm chosen.

From [21, Lemma 6] we have that consensus requires $\Omega(N)$
rounds for fully-connected graphs, and the performance for
random graphs cannot be any better. As in the proof of
unquantized randomized gossip, the neighborhood size scales
as $\Theta(\log N)$, so the time-bandwidth product scales as

$$B = \Omega(N \log N). \tag{62}$$

Since we require $\Omega(N^2)$ total transmissions, we have

$$E = \Omega(L N^{2-\alpha/2} (\log N)^{\alpha/2}). \tag{63}$$

Finally, we examine the mean-squared error. In the best
case, the dynamics converge on true consensus, meaning that
$q_n(T)$ is the same for each $n$. In this case the final estimates
are merely the average of the quantized versions of the measurements $z_n(0)$.
Therefore the final estimates are

$$q_n(T) = \frac{1}{N} \sum_{m=1}^{N} q_m(0) = \frac{1}{N} \sum_{m=1}^{N} (z_m(0) + e_m(0)) = z_{\mathrm{ave}} + \frac{1}{N} \sum_{m=1}^{N} e_m(0),$$

where $e_m(0)$ is the quantization error of the initial estimate.
In the worst case, each $|e_m(0)| = \Delta/2 = L^{-1}/2$. Since the
errors are uncorrelated, the squared error follows

\begin{align*}
\sigma^2 &= E\left[\left(\frac{1}{N} \sum_{m=1}^{N} e_m(0)\right)^2\right] \\
&= \frac{1}{N^2} \sum_{m=1}^{N} E\left[|e_m(0)|^2\right] \\
&= N^{-1} L^{-2}/4 \\
&= \Omega(N^{-1} L^{-2}).
\end{align*}

Choosing $L = N^x$ gives the result. ■
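
The $N^{-1}L^{-2}$ order of the error can be illustrated with a short Monte Carlo experiment. The sketch below assumes that the initial quantization errors are independent and uniform on $[-\Delta/2, \Delta/2)$, which is a modeling assumption for this example; under it the variance of the averaged error is $\Delta^2/(12N)$, a constant factor below the worst-case value used in the proof but of the same order. The function name `averaged_error_variance` and its default parameters are hypothetical.

```python
import random

def averaged_error_variance(N, L, trials=2000, seed=0):
    """Estimate E[((1/N) * sum of e_m(0))^2] for independent uniform errors.

    Each error is drawn uniformly from [-Delta/2, Delta/2) with Delta = 1/L
    (an assumption standing in for the dithered quantizer).
    """
    rng = random.Random(seed)
    delta = 1.0 / L
    total = 0.0
    for _ in range(trials):
        mean_err = sum(rng.uniform(-delta / 2, delta / 2) for _ in range(N)) / N
        total += mean_err ** 2
    return total / trials
```

Comparing the estimate with $\Delta^2/(12N)$ for a few values of $N$ and $L$ shows the expected behavior: doubling $N$ halves the error, and doubling $L$ reduces it by a factor of four.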
We hasten to point out that the bounds here are rather 
generous, since we supposed that convergence on a random 
graph is as fast as on a fully-connected graph. In practice, 
as we shall see in the numerical results presented later, the 
performance is somewhat worse. 



B. Hierarchical Averaging 

We characterize the performance of hierarchical averaging 
with quantization. As before, cells of nodes at lower layers 
achieve local consensus, after which they broadcast their 
estimates to nearby clusters, continuing the process until 
global consensus is achieved. Here, however, each estimate 
is quantized prior to transmission, which introduces error that 
accumulates during consensus. 

As in the previous subsection, we employ the dithered quantizer.
For the uniform quantization alphabet with cardinality $L$,
let the quantized version of the estimate $z_n(t)$ be denoted

$$q_n(t) = \phi(z_n(t)). \tag{64}$$

We can write each quantized value as

$$q_n(t) = z_n(t) + v_n(t), \tag{65}$$

where each $v_n(t)$ is uniform over $[-\Delta/2, \Delta/2)$ and independent
for every $n$, $t$.

We choose $T = \lceil \log_4 N^{1-\kappa} \rceil$ and define the cells $C_{jk}(t)$ as
before. At time slot $t = 1$, each node $n$ quantizes its initial
measurement $z_n(0)$ and broadcasts the quantized value to the
nodes in $C(n, 1)$. Following (39), this requires

$$P_n(1) = O(L N^{(\kappa-1)\alpha/2}), \tag{66}$$

where the dependence on $L$ arises since $\gamma = \Theta(L)$ and $L$,
and therefore $\gamma$, may depend on $N$. Each node $n$ updates its
estimate by averaging the quantized estimates in its cluster:

\begin{align*}
z_n(1) &= \frac{1}{4^{1-T}N} \sum_{m \in C(n,1)} q_m(0) \tag{67} \\
&= \frac{1}{4^{1-T}N} \sum_{m \in C(n,1)} (z_m(0) + v_m(0)). \tag{68}
\end{align*}

As before we use the normalization factor $1/(4^{1-T}N)$ in order
to avoid nodes' needing to know the cluster cardinalities. Next,
at time slot $2 \le t \le T$, each cluster at layer $t-1$ quantizes its
estimate and cooperatively broadcasts to the members of its
parent cluster at layer $t$. Following (41) and (42), this requires

$$P_n(t) = \begin{cases} O\left(L 4^{(\alpha/2-2)t} N^{-\alpha/2+\kappa(\alpha/2-2)}\right), & \text{for fixed phases} \\ O\left(L 4^{(\alpha/2-1)t} N^{(\kappa-1)\alpha/2}\right), & \text{for uniform phases} \end{cases} \tag{69}$$

At time step $t = 2$, each node $n$ averages together the
estimates from each of the subclusters $C(m, t-1) \subset C(n, t)$,
giving us

\begin{align*}
z_n(2) &= \frac{1}{4} \sum_{C(m,1) \subset C(n,2)} q_{C(m,1)}(1) \\
&= \frac{1}{4} \sum_{C(m,1) \subset C(n,2)} \left( z_{C(m,1)}(1) + v_{C(m,1)}(1) \right) \\
&= \frac{1}{4} \sum_{C(m,1) \subset C(n,2)} \left( \frac{1}{4^{1-T}N} \sum_{k \in C(m,1)} (z_k(0) + v_k(0)) + v_{C(m,1)}(1) \right) \\
&= \frac{1}{4^{2-T}N} \sum_{k \in C(n,2)} (z_k(0) + v_k(0)) + \frac{1}{4} \sum_{C(m,1) \subset C(n,2)} v_{C(m,1)}(1).
\end{align*}

Continuing by induction, at arbitrary round $t$ the estimate is

$$z_n(t) = \frac{1}{4^{t-T}N} \sum_{k \in C(n,t)} (z_k(0) + v_k(0)) + \sum_{s=1}^{t-1} \sum_{M \in R_n(t,s)} 4^{s-t} v_M(s), \tag{70}$$

where $R_n(t, s)$ is the set of all clusters $C(m, s)$ that are subsets
of $C(n, t)$. In other words, at round $t$ we have the total average
so far, corrupted by quantization noise from each of the rounds
$s < t$.
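
The weights in (70) determine how much quantization noise survives to the final round. As a rough sketch, and assuming that all noise terms are independent with a common variance, the variance of the final estimate is that common variance multiplied by the sum of the squared weights at $t = T$. The helper below computes this sum exactly with rational arithmetic; the name `noise_gain` is hypothetical.

```python
from fractions import Fraction

def noise_gain(T, N):
    """Sum of the squared weights on the noise terms in (70), evaluated at t = T.

    The N initial noise terms each carry weight 1/N, and for 1 <= s <= T - 1
    there are 4^(T - s) clusters whose noise terms carry weight 4^(s - T).
    """
    gain = Fraction(1, N)  # N terms, each with squared weight 1/N^2
    for s in range(1, T):
        clusters = 4 ** (T - s)             # number of layer-s clusters, |R_n(T, s)|
        weight = Fraction(1, 4 ** (T - s))  # 4^(s - T)
        gain += clusters * weight ** 2
    return gain
```

The sum has the closed form $1/N + (1 - 4^{1-T})/3$, which stays below $1/N + 1/3$ for every $T$. The accumulated noise is therefore a bounded multiple of the per-round quantization noise, however many layers the hierarchy has.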

In the following theorem, we detail the resource-estimate 
tradeoff achieved by this scheme. 

Theorem 7: Using dithered quantization, hierarchical averaging
achieves the following tradeoff between resource consumption
and estimation error with high probability:

$$T = B = O(N^\kappa), \tag{71}$$

$$E = \begin{cases} O(N^{1-\alpha/2+\kappa\alpha/2+x}), & \text{for fixed phases} \\ O(N^{\kappa\alpha/2+x}), & \text{for uniform phases} \end{cases} \tag{72}$$

$$\sigma^2 = \Theta(N^{-2x}), \tag{73}$$

for any $x \ge 0$, $\kappa > 0$, and $2 \le \alpha < 4$. In particular, for $x = 0$
we obtain an estimation error that is constant in the network
size using the same amount of energy as in the non-quantized
case.



Proof: Choose an alphabet size $L$. Since the number of
rounds and the cluster geometry is unchanged from the non-quantized
case, we can repeat the argument from Theorem 5,
yielding $T = B = O(N^\kappa)$. Since the transmit power is
changed only by a factor of $L$, we can repeat the arguments
from Theorem 5, which yields

$$E = \begin{cases} O(L N^{1-\alpha/2+\kappa\alpha/2}), & \text{for fixed phases} \\ O(L N^{\kappa\alpha/2}), & \text{for uniform phases.} \end{cases}$$

All that remains is to bound the estimation error. Evaluating
(70) for $t = T$, we get, for every $n$,

\begin{align*}
z_n(T) &= \frac{1}{N} \sum_{k=1}^{N} (z_k(0) + v_k(0)) + \sum_{s=1}^{T-1} \sum_{M \in R_n(T,s)} 4^{s-T} v_M(s) \\
&= z_{\mathrm{ave}} + \sum_{s=0}^{T-1} \sum_{M \in R_n(T,s)} 4^{s-T} v_M(s).
\end{align*}

The mean squared estimation error is therefore

\begin{align*}
\sigma^2 &= E\left[\left| \sum_{s=0}^{T-1} \sum_{M \in R_n(T,s)} 4^{s-T} v_M(s) \right|^2\right] \\
&= \sum_{s=0}^{T-1} \sum_{M \in R_n(T,s)} 4^{2(s-T)} E\left[|v_M(s)|^2\right],
\end{align*}

where the equality is due to the independence of the quantization
error terms. Since each $v_M(s)$ is uniformly distributed
across $[-\Delta/2, \Delta/2)$, we have $E[|v_M(s)|^2] = \Theta(\Delta^2) = \Theta(L^{-2})$.
Therefore, we have

\begin{align*}
\sigma^2 &= \Theta(L^{-2}) \sum_{s=0}^{T-1} \sum_{M \in R_n(T,s)} 4^{2(s-T)} \\
&= \Theta(L^{-2}) \sum_{s=0}^{T-1} 4^{T-s} 4^{2(s-T)} \\
&= \Theta(L^{-2}) \, 4^{-T} \frac{1 - 4^{T}}{1 - 4} \\
&= \Theta(L^{-2}),
\end{align*}

since $4^T = \Theta(N^{1-\kappa})$ grows without bound. Choosing $L = N^x$ yields the result.

C. Numerical Results

We examine the empirical performance of the quantized
consensus algorithms discussed. We also run simulations for randomized
gossip, employing dithered quantization to accommodate the
finite-rate links. We again choose $\gamma = 10$ dB, $\kappa = 0$, and we
again let $N$ run from 10 to 1000 and average performance
over 20 initializations, but here we choose $\alpha = 2$. Choosing
$\gamma$ constant means that the quantization bin width $\Delta$ is constant in
$N$, and the minimum quantization error is itself constant. In
Figure 3 we plot the energy $E$ against the mean-square error $\sigma^2$.

Fig. 3. Total energy $E$ and mean-square error $\sigma^2$ for several quantized
consensus algorithms.

The energy expenditure for hierarchical averaging is consis- 
tent with theory, although we note that uniform phase results in 
higher expenditure than fixed phase, even though the scaling 
laws are the same. The energy expenditure for randomized 
gossip increases roughly linearly in N, suggesting that the 
energy burden with fixed 7 is similar to the non-quantized 
case. As expected, quantized consensus performs worse than 
predicted by Theorem 6. The energy consumption is on par
with randomized gossip, but it accrues estimation error as 
N increases. The other schemes have bounded or decreasing 
error. 

VII. Conclusion 

In this paper we have studied consensus from an explicitly 
wireless perspective, defining a path-loss model, confronting
interference, and defining resource consumption in terms of
energy, time, and bandwidth. We have shown that, while 
existing consensus algorithms such as gossip may be order- 
optimal with respect to the amount of energy required to 
achieve consensus, they are strictly suboptimal with respect 
to the time and bandwidth required. Additionally, we have 
proposed hierarchical averaging, an approach to consensus 
derived explicitly for wireless. For free-space propagation, or 
for any $2 \le \alpha < 4$ and with fixed channel phase, hierarchical
averaging is nearly order optimal with respect to all three 
metrics simultaneously. We also examined the effects of quan- 
tization. Using dithered quantization, we showed that, without 
expending any additional energy over the non-quantized case, 
hierarchical averaging suffers only from bounded estimation 
error in the size of the network. Therefore, hierarchical aver- 
aging appears to be an efficient, robust approach to consensus 
over wireless networks. 

However, in constructing a wireless model in which analysis 
is tractable, we made several simplifying assumptions. In 
particular, we supposed that nodes are synchronized, that 
out-of-range nodes do not interfere with other nodes, and 
that channel gains are completely determined by the path- 
loss model. Future work involves exploring the effect of 
relaxing these assumptions. In the case of synchronization, 
one could incorporate the costs of medium-access techniques 
such as CSMA. In the case of interference, one could model 
neighborhoods according to a signal-to-interference-plus-noise
ratio and derive optimal scaling laws. In the case of channel 
gains, the effects of fading and outage merit study. We expect 
that, since it is cooperative in nature, hierarchical averaging is 
particularly robust to outage. 

Finally, improvements to hierarchical averaging are possi- 
ble. In the case of uniform phase, hierarchical averaging was 
suboptimal with respect to transmit energy for $\alpha > 2$. This is
due to the fact that cooperative transmissions do not combine
coherently at the receivers. In [12], in addition to transmitter
cooperation, distributed receiver cooperation is used to secure 
order-optimal performance. It is possible that a similar ap- 
proach will secure order-optimal performance in consensus, 
regardless of channel phases or path-loss exponents.